- .TH ANTLR 1 "April 1994" "ANTLR" "PCCTS Manual Pages"
- .SH NAME
- antlr \- ANother Tool for Language Recognition
- .SH SYNTAX
- .LP
- \fBantlr\fR [\fIoptions\fR] \fIgrammar_files\fR
- .SH DESCRIPTION
- .PP
- \fIAntlr\fP converts an extended form of context-free grammar into a
- set of C functions which directly implement an efficient form of
- deterministic recursive-descent LL(k) parser. Context-free grammars
- may be augmented with predicates to allow semantics to influence
- parsing; this allows a form of context-sensitive parsing. Selective
- backtracking is also available to handle non-LL(k) and even
- non-LALR(k) constructs. \fIAntlr\fP also produces a definition of a
- lexer which can be automatically converted into C code for a DFA-based
- lexer by \fIdlg\fR. Hence, \fIantlr\fR serves a function much like
- that of \fIyacc\fR; however, it is notably more flexible and is more
- integrated with a lexer generator (\fIantlr\fR directly generates
- \fIdlg\fR code, whereas \fIyacc\fR and \fIlex\fR are given independent
- descriptions). Unlike \fIyacc\fR, which accepts LALR(1) grammars,
- \fIantlr\fR accepts LL(k) grammars in an extended BNF notation \(em
- which eliminates the need for precedence rules.
- .PP
- Like \fIyacc\fR grammars, \fIantlr\fR grammars can use
- automatically-maintained symbol attribute values referenced as dollar
- variables. Further, because \fIantlr\fR generates top-down parsers,
- arbitrary values may be inherited from parent rules (passed like
- function parameters). \fIAntlr\fP also has a mechanism for creating
- and manipulating abstract-syntax-trees.
- .PP
- There are various other niceties in \fIantlr\fR, including the ability to
- spread one grammar over multiple files or even multiple grammars in a single
- file, the ability to generate a version of the grammar with actions stripped
- out (for documentation purposes), and lots more.
- .SH OPTIONS
- .IP "\fB-ck \fIn\fR"
- Use up to \fIn\fR symbols of lookahead when using compressed (linear
- approximation) lookahead. This type of lookahead is very cheap to
- compute and is attempted before full LL(k) lookahead, which is of
- exponential complexity in the worst case. In general, the compressed
- lookahead can be much deeper (e.g., \f(CW-ck 10\fP) than the full
- lookahead (which usually must be less than 4).
- .IP \fB-CC\fP
- Generate C++ output from both ANTLR and DLG.
- .IP \fB-cr\fP
- Generate a cross-reference for all rules. For each rule, print a list
- of all other rules that reference it.
- .IP \fB-ct\fP
- Do not make copies of tokens passed to the parser in C++ mode
- (the default is to copy). When using DLG in conjunction with ANTLR, you will
- always want ANTLR to make copies because DLG only has space for one
- \f(CWANTLRToken\fP (which is passed to the scanner with
- \f(CWsetToken\fP); this address is always returned and, hence, without
- copies, all $-variables would point to the same \f(CWANTLRToken\fP.
- .IP \fB-e1\fP
- Ambiguities/errors shown in low detail (default).
- .IP \fB-e2\fP
- Ambiguities/errors shown in more detail.
- .IP \fB-e3\fP
- Ambiguities/errors shown in excruciating detail.
- .IP "\fB-fe\fP file"
- Rename \fBerr.c\fP to file.
- .IP "\fB-fh\fP file"
- Rename \fBstdpccts.h\fP header (turns on \fB-gh\fP) to file.
- .IP "\fB-fl\fP file"
- Rename lexical output, \fBparser.dlg\fP, to file.
- .IP "\fB-fm\fP file"
- Rename file with lexical mode definitions, \fBmode.h\fP, to file.
- .IP "\fB-fr\fP file"
- Rename file which remaps globally visible symbols, \fBremap.h\fP, to file.
- .IP "\fB-ft\fP file"
- Rename \fBtokens.h\fP to file.
- .IP \fB-ga\fP
- Generate ANSI-compatible code (default case). This has not been
- rigorously tested to be ANSI X3J11 C compliant, but it is close. The
- normal output of \fIantlr\fP is currently compilable under K&R C,
- ANSI C, and C++\(emthis option does nothing because \fIantlr\fP
- generates a bunch of #ifdef's to do the right thing depending on the
- language.
- .IP \fB-gc\fP
- Indicates that \fIantlr\fP should generate no C code, i.e., only
- perform analysis on the grammar.
- .IP \fB-gd\fP
- C code is inserted in each of the \fIantlr\fR generated parsing functions to
- provide for user-defined handling of a detailed parse trace. The inserted
- code consists of calls to the user-supplied macros or functions called
- \fBzzTRACEIN\fR and \fBzzTRACEOUT\fP. The only argument is a
- \fIchar *\fR pointing to a C-style string which is the grammar rule
- recognized by the current parsing function. If no definition is given
- for the trace functions, upon rule entry and exit, a message will be
- printed indicating that a particular rule has been entered or exited.
- .IP \fB-ge\fP
- Generate an error class for each non-terminal.
- .IP \fB-gh\fP
- Generate \fBstdpccts.h\fP for non-ANTLR-generated files to include.
- This file contains all defines needed to describe the type of parser
- generated by \fIantlr\fP (e.g. how much lookahead is used and whether
- or not trees are constructed) and contains the \fBheader\fP action
- specified by the user.
- .IP \fB-gk\fP
- Generate parsers that delay lookahead fetches until needed. Without
- this option, \fIantlr\fP generates parsers which always have \fIk\fP
- tokens of lookahead available. This option is incompatible with
- \fB-pr\fP and renders references to \fBLA(\fIi\fB)\fR invalid as
- one never knows when the \fIith\fP token of lookahead will be fetched.
- .IP \fB-gl\fP
- Generate line info about grammar actions in C parser of the form
- \fB#\ \fIline\fP\ "\fIfile\fP"\fR which makes error messages from
- the C/C++ compiler make more sense as they will \*Qpoint\*U into the
- grammar file rather than the resulting C file. Debugging is easier as well,
- because you will step through the grammar rather than the C file.
- .IP "\fB-gp \fIprefix\fR"
- Prefix all functions generated from rules with \fIprefix\fP. This is now
- obsolete. Use the \*Q#parser "name"\*U \fIantlr\fP directive.
- .IP \fB-gs\fR
- Do not generate sets for token expression lists; instead generate a
- \fB||\fP-separated sequence of \fBLA(1)==\fItoken_number\fR. The
- default is to generate sets.
- .IP \fB-gt\fP
- Generate code for Abstract-Syntax Trees.
- .IP \fB-gx\fP
- Do not create the lexical analyzer files (dlg-related). This option
- should be given when the user wishes to provide a customized lexical
- analyzer. It may also be used in \fImake\fR scripts to cause only the
- parser to be rebuilt when a change not affecting the lexical structure
- is made to the input grammars.
- .IP "\fB-k \fIn\fR"
- Set k of LL(k) to \fIn\fR; i.e., set the number of tokens of lookahead (default=1).
- .IP "\fB-o\fP dir"
- Directory where output files should go (default="."). This is very
- nice for keeping the source directory clear of ANTLR and DLG spawn.
- .IP \fB-p\fP
- The complete grammar, collected from all input grammar files and
- stripped of all comments and embedded actions, is listed to
- \fBstdout\fP. This is intended to aid in viewing the entire grammar
- as a whole and to eliminate the need to keep actions concisely stated
- so that the grammar is easier to read. Hence, it is preferable to
- embed even complex actions directly in the grammar, rather than to
- call them as subroutines, since the subroutine call overhead will be
- saved.
- .IP \fB-pa\fP
- This option is the same as \fB-p\fP except that the output is
- annotated with the first sets determined from grammar analysis.
- .IP \fB-pr\fP
- Obsolete\ \(em used to turn on use of predicates in parsing decisions
- in release 1.06. Now, in 1.10, the specification of a predicate
- implies that it should be used. When a syntactic ambiguity is
- discovered, \fIantlr\fP searches for predicates that can be used to
- disambiguate the decision. Predicates have dual roles as semantic
- validation and disambiguation predicates.
- .IP "\fB-prc on\fR"
- Turn on the computation and hoisting of predicate context.
- .IP "\fB-prc off\fR"
- Turn off the computation and hoisting of predicate context. This
- option makes 1.10 behave like the 1.06 release with option \fB-pr\fR
- on. Context computation is off by default.
- .IP "\fB-rl \fIn\fR"
- Limit the maximum number of tree nodes used by grammar analysis to
- \fIn\fP. Occasionally, \fIantlr\fP is unable to analyze a grammar
- submitted by the user. This rare situation can only occur when the
- grammar is large and the amount of lookahead is greater than one. A
- nonlinear analysis algorithm is used by PCCTS to handle the general
- case of LL(k) parsing. The average complexity of analysis, however, is
- near linear due to some fancy footwork in the implementation which
- reduces the number of calls to the full LL(k) algorithm. If this limit
- is reached, an error message is displayed that indicates
- the grammar construct being analyzed when \fIantlr\fP hit a
- non-linearity. Use this option if \fIantlr\fP seems to go out to
- lunch and your disk starts thrashing; try \fIn\fP=10000 to start. Once
- the offending construct has been identified, try to remove the
- ambiguity that \fIantlr\fP was trying to overcome with large lookahead
- analysis. The introduction of (...)? backtracking blocks eliminates
- some of these problems\ \(em \fIantlr\fP does not analyze alternatives
- that begin with (...)? (it simply backtracks, if necessary, at run
- time).
- .IP \fB-w1\fR
- Set low warning level. Do not warn if semantic predicates and/or
- (...)? blocks are assumed to cover ambiguous alternatives.
- .IP \fB-w2\fR
- Ambiguous parsing decisions yield warnings even if semantic predicates
- or (...)? blocks are used. Warn if predicate context computed and
- semantic predicates incompletely disambiguate alternative productions.
- .IP \fB-\fR
- Read grammar from standard input and generate \fBstdin.c\fP as the
- parser file.
- .SH "SPECIAL CONSIDERATIONS"
- .PP
- \fIAntlr\fP works... we think. There is no implicit guarantee of
- anything. We reserve no \fBlegal\fP rights to the software known as
- the Purdue Compiler Construction Tool Set (PCCTS) \(em PCCTS is in the
- public domain. An individual or company may do whatever they wish
- with source code distributed with PCCTS or the code generated by
- PCCTS, including the incorporation of PCCTS, or its output, into
- commercial software. We encourage users to develop software with
- PCCTS. However, we do ask that credit be given to us for developing
- PCCTS. By "credit", we mean that if you incorporate our source code
- into one of your programs (commercial product, research project, or
- otherwise), you acknowledge this fact somewhere in the
- documentation, research report, etc. If you like PCCTS and have
- developed a nice tool with the output, please mention that you
- developed it using PCCTS. As long as these guidelines are followed,
- we expect to continue enhancing this system and expect to make other
- tools available as they are completed.
- .SH FILES
- .IP *.c
- output C parser
- .IP *.C
- output C++ parser when C++ mode is used
- .IP \fBparser.dlg\fP
- output \fIdlg\fR lexical analyzer
- .IP \fBerr.c\fP
- token string array, error sets and error support routines
- .IP \fBremap.h\fP
- file that redefines all globally visible parser symbols. The use of
- the #parser directive creates this file.
- .IP \fBstdpccts.h\fP
- list of definitions needed by C files, not generated by PCCTS, that
- reference PCCTS objects. This is not generated by default.
- .IP \fBtokens.h\fP
- output \fI#defines\fR for tokens used and function prototypes for
- functions generated for rules
- .SH "SEE ALSO"
- .LP
- dlg(1), pccts(1)
-